WIP: Better formatting for .set_precision #11667

Closed
wants to merge 10 commits into from

mattilyra

closes #11656

I found some extra time behind the sofa and started working on this. I made quite a few additions but by default everything should work as it did.

  • just calling .set_precision with an int will truncate all columns to that many decimal places as before
  • precision can be set per column as an extra dict to .set_precision
  • the precision values can be either ints or valid python str.format strings
    • I wanted to add this to allow formats like {:.2e} and possibly some string/date manipulations
    • the .set_precision should perhaps be called set_format (or format_cells) as a result
  • all formatting can be reset by calling .set_precision() (no args)
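
A minimal sketch of how such a per-column precision spec might be resolved, in plain Python rather than the actual Styler code. The helper names `to_format_string` and `resolve_format`, and the use of a `'__default__'` key for the fallback entry, are assumptions made for illustration.

```python
def to_format_string(spec):
    """Turn an int (decimal places) or a str.format string into a format string."""
    if isinstance(spec, int):
        return "{:." + str(spec) + "f}"
    return spec


def resolve_format(precision, column):
    """Look up the format for one column, falling back to the default entry."""
    spec = precision.get(column, precision["__default__"])
    return to_format_string(spec)


# Hypothetical per-column spec: ints and format strings can be mixed.
precision = {"__default__": 6, "score_1": 0, "score_2": "{:.2e}"}

print(resolve_format(precision, "score_1").format(41.6))         # 42
print(resolve_format(precision, "score_2").format(0.000123456))  # 1.23e-04
print(resolve_format(precision, "param_2").format(0.0004))       # 0.000400
```

Columns without their own entry fall back to the default, which is how a bare int argument could keep behaving as before.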

what's missing?

  • the documentation needs to be updated to reflect these changes (if you're happy with what's here I'll move on to updating the docs)
  • should leave the index alone and just format the data columns
  • tests?
  • support for pd.IndexSlice for subsets?

import itertools
import pandas as pd
import numpy as np

jobs = itertools.product(['a', 'b', 'c', 'd'], np.arange(1e-4, 1e-3, .0003), range(10))
rows = []
for v, v2, itr in jobs:
    rows.append({'param_1': v, 'param_2': v2, 'iter': itr,
                 'score_1': np.random.randint(0, 100, size=(1,))[0],
                 'score_2': np.random.rand(1, )[0]})
df_multi = pd.DataFrame(rows)
agg = df_multi.groupby(by=['param_1', 'param_2'])[['score_1', 'score_2']].agg(['mean', 'sem'])

printing with just the defaults
[Screenshot: 2015-11-20 at 18:56:56]

crazy awesome new formatting (ignore that the kwarg is called column_formats; that's been changed to subsets)

[Screenshot: 2015-11-20 at 18:59:09]

@jreback added the Output-Formatting and IO HTML labels Nov 20, 2015
@jreback jreback changed the title Better formatting for .set_precision Better formatting for .set_precision Nov 20, 2015
@TomAugspurger
Contributor

Thanks for putting this up.

A bit busy ATM, but I'll try to review in the next couple weeks. Sorry for the wait!

@mattilyra
Author

No worries, I've been quite busy as well. There's a related issue from yesterday, #11692. The bottom line is that all the builtin styles, not just .set_precision, should allow defining the column display format.

@@ -144,7 +151,18 @@ def __init__(self, data, precision=None, table_styles=None, uuid=None,
         self.table_styles = table_styles
         self.caption = caption
         if precision is None:
-            precision = pd.options.display.precision
+            precision = {'__default__': pd.options.display.precision}
Contributor

just call .set_precision instead of repeating this (and all of this logic should be there).

Further I think that a .set_precision() should reset to the default (e.g. pd.options.display.precision), so pls put in a test for this.

@jreback
Contributor

jreback commented Nov 29, 2015

pls add some tests / whatsnew note enhancements / change the html-styling.ipynb to show the features as well.

@TomAugspurger
Contributor

@mattilyra going to throw out some partially-formed thoughts here, let me know what you think.

In light of #11692 (Styler not able to take formatters) I think we need a more general solution than what you've got here (but this will be part of that). I think we should determine the display value in the .translate step, and have none (or very minimal) logic in the Template itself. The default display value would be the raw value itself. Instead of

{% for c in r %}  {# for each column in the row #}
<{{c.type}} id="T_{{uuid}}{{c.id}}" class="{{c.class}}">
    {% if c.value is number %}
        {{c.value|round(precision)}}
   {# more elifs for each type of formatting? This will get ugly #}
    {% else %}
        {{c.value}}
    {% endif %}
{% endfor %}

We'd have

{% for c in r %}
<{{c.type}} id="T_{{uuid}}{{c.id}}" class="{{c.class}}">
        {{c.display_value}}  {# <-------- The change is here #}
{% endfor %}

We'd still need to define a nice API for users to map values to display values. Something like what you've got here would be good. And keeping shortcuts like .set_precision will still be useful, as you say.
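
A rough sketch of that split, written in plain Python so it runs without pandas or Jinja. The names `translate` and `render`, the cell dict layout, and the column-position keys are illustrative assumptions, not the code in this PR.

```python
def translate(rows, formatters, default=str):
    """Build per-cell dicts holding both the raw value and its display value."""
    body = []
    for r, row in enumerate(rows):
        cells = []
        for c, value in enumerate(row):
            fmt = formatters.get(c, default)  # formatter chosen per column position
            cells.append({
                "id": "_row{}_col{}".format(r, c),
                "value": value,
                "display_value": fmt(value),
            })
        body.append(cells)
    return body


def render(body):
    """Stand-in for the template: it only emits display_value, with no branching."""
    return "\n".join(
        "<tr>"
        + "".join("<td>{}</td>".format(cell["display_value"]) for cell in row)
        + "</tr>"
        for row in body
    )


rows = [["a", 0.123456], ["b", 2.5]]
body = translate(rows, {1: "{:.2f}".format})
print(render(body))
```

All type checks and formatting decisions live in `translate`, so the rendering step never needs to know what kind of value it is printing.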

@TomAugspurger
Contributor

Separating the display_value from the underlying value would also help with wrapping the content in arbitrary tags (links and images, for example).

[Screenshot: 2015-12-05 at 9:10:55 am]

(I modify the translated context directly here; users would have a method to do the wrapping)
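
As a hedged illustration of that idea, the wrapping could be an ordinary function from value to display string. The `as_link` helper and the example.com URL pattern below are hypothetical, and the cell is a bare dict standing in for the translated context.

```python
from html import escape


def as_link(url_template):
    """Return a function that wraps a cell value in an <a> tag."""
    def wrap(value):
        text = escape(str(value))
        href = escape(url_template.format(value), quote=True)
        return '<a href="{}">{}</a>'.format(href, text)
    return wrap


# The raw value stays untouched; only the display value carries markup.
cell = {"value": 11667}
cell["display_value"] = as_link("https://example.com/items/{}")(cell["value"])
print(cell["display_value"])
```

Because `value` is left alone, style functions that compare or rank cells would still see the number rather than the HTML.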

@mattilyra
Author

@TomAugspurger I agree that separating the display value makes sense. I don't quite get the point of having .set_precision if the formatting functionality goes into .format. Doesn't .set_precision become redundant in that case?

I think your second example should already be possible with the changes here, as you can pass arbitrary format strings as the format for each column. The logic is a little unclear in the template and should probably be moved to ._translate.

@TomAugspurger
Contributor

See my PR at #11768

Roughly what I envision is that PR will define an API for people to apply custom formatters on a column / subset basis.

I don't quite get the point of having .set_precision if the formatting functionality goes into .format

.set_precision might still be useful as a shorter way of getting the same result. Instead of .set_format(lambda x: x if not com.is_float(x) else "{:.2f}".format(x)), you type .set_precision(2). I'm still not sure how .format / .set_format will work, how multiple formatters will be combined if at all.

@TomAugspurger
Contributor

as you can pass arbitrary format strings as the format for each column

I didn't notice that when I first looked through.

Again, this hasn't fully formed in my mind yet, but do you think that .set_precision can be rewritten in terms of #11768? So set_precision will have a nice API for setting different precision levels per column, etc. And the implementation will be modifying the internal dict (I called it _display_funcs) tracking how to format each cell. It doesn't need to know anything about the template or the ._translate step.
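
One way this could look, as a sketch only: a toy class (`MiniStyler`, a made-up name) whose `set_precision` does nothing but rewrite a `_display_funcs` dict. The `subset` dict argument and the `display` helper are assumptions for the example and are not taken from #11768.

```python
class MiniStyler:
    """Toy stand-in for Styler; it only tracks how each column is displayed."""

    def __init__(self, columns, default_precision=6):
        self.columns = list(columns)
        self.default_precision = default_precision
        self._display_funcs = {}
        self.set_precision()

    @staticmethod
    def _float_formatter(precision):
        fmt = "{:." + str(precision) + "f}"
        # Floats are rounded for display; everything else is shown as-is.
        return lambda x: fmt.format(x) if isinstance(x, float) else str(x)

    def set_precision(self, precision=None, subset=None):
        """Rewrite _display_funcs; calling with no arguments resets to the default."""
        if precision is None:
            precision = self.default_precision
        self._display_funcs = {
            col: self._float_formatter(precision) for col in self.columns
        }
        for col, col_precision in (subset or {}).items():
            self._display_funcs[col] = self._float_formatter(col_precision)
        return self

    def display(self, column, value):
        return self._display_funcs[column](value)


styler = MiniStyler(["score_1", "score_2"]).set_precision(2, subset={"score_2": 4})
print(styler.display("score_1", 3.14159))  # 3.14
print(styler.display("score_2", 3.14159))  # 3.1416
```

Nothing here touches a template or a translate step; those would only need to read `_display_funcs`.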

@mattilyra
Author

I'm still not sure how .format / .set_format will work, how multiple formatters will be combined if at all.

What do you mean multiple formatters?

but do you think that .set_precision can be rewritten in terms of #11768?

Yeah, should be possible. Currently .set_precision takes a dict (subset) mapping column names to the formats those columns should be displayed in. The format can be either an int (specifying the number of decimal places) or a Python format string. This is then handled in the template (to be moved to ._translate).

I hadn't thought that you would call .set_format as .set_format(lambda x: x if not com.is_float(x) else "{:.2f}".format(x)) but as .set_format(a='{:.2f}') (where a is a column name), or perhaps by explicitly passing in a dict of column names to format values, as .set_precision currently does (in this PR) for subset. How you would add the formatting for one specific cell I'm not sure; could we use the IndexSlice to let the user specify which column/cell the format applies to? That would at least be fairly general. Currently it would be difficult to handle that in the template, but moving the display_value logic away from the template would also make that easier.

If we can nail down how this whole thing should roughly work I'm happy to push up a proposal implementation for this.

@TomAugspurger
Contributor

I'm still not sure how .format / .set_format will work, how multiple formatters will be combined if at all.

What do you mean multiple formatters?

Never mind about that, probably a bad idea.

but as .set_format(a='{:.2f}')

Yes, I think you're right. Sorry about that; this is all still coming together. The way I see .format working now is that it takes a formatter which is either a single value to be applied to all items or a dict of column name: formatter. An individual formatter can be

  • a string s in which case s.format(x) is the display value (x is an individual cell)
df.style.format('{:.2f}')  # equivalent to `.set_precision(2)`
  • a callable, in which case c(x) is the display value (is this even necessary?)
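
The two formatter kinds described above can be normalized to plain callables fairly cheaply. This is a sketch under assumptions: `as_callable` and `expand` are hypothetical helper names, and columns are identified by simple strings.

```python
def as_callable(formatter):
    """A format string s becomes s.format; a callable is used as-is."""
    if isinstance(formatter, str):
        return formatter.format
    if callable(formatter):
        return formatter
    raise TypeError("formatter must be a format string or a callable")


def expand(formatter, columns):
    """Map columns to callables, given one formatter for all or a dict per column."""
    if isinstance(formatter, dict):
        return {col: as_callable(fmt) for col, fmt in formatter.items()}
    func = as_callable(formatter)
    return {col: func for col in columns}


columns = ["score_1", "score_2"]
funcs = expand({"score_1": "{:.0f}", "score_2": lambda x: "{:.1%}".format(x)}, columns)
print(funcs["score_1"](41.6))   # 42
print(funcs["score_2"](0.256))  # 25.6%
```

Once everything is a callable, the code that produces display values only has one case to handle.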

I hate to make the work you've done here obsolete :/ but it looks like a more general .format / .set_format option is the way to go. Interested in doing that? I have a sketch in https://github.com/pydata/pandas/pull/11768/files for how the translate step will work if you want to implement .format on top of that.

@mattilyra
Author

Well it isn't really extra work as that's more or less how .set_precision works in this PR. I just need to transfer the logic of parsing the format value to .translate from the template and write examples.

I think having the option of passing a callable as the formatter is also a good idea, it allows for a lot of flexibility. I think format callable could in that case take also the row and column indices not just the cell values - I can think of use cases where knowing for instance the row value of a MultiIndex becomes quite useful.

I'm happy to keep working on this.

@TomAugspurger
Contributor

I just need to transfer the logic of parsing the format value to .translate from the template and write examples.

Feel free to repurpose this PR to take over .format. Are you comfortable enough with git to merge in my branch? Or you can just copy-paste the changes. I've got a probably bad, certainly broken implementation of .format there that you can take pieces from.

For .format I think the signature should be

def format(self, formatter, subset=None):

where formatter can be either (1) a single formatter to be applied to the entire frame (or to a subset, when subset is not None) or (2) a dict of column: formatter.
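
To make the interplay of formatter and subset concrete, here is a sketch over a plain dict-of-columns table instead of a DataFrame. The function name `format_frame` is made up, and treating subset as a list of column names is a simplifying assumption (it says nothing about how pd.IndexSlice would be handled).

```python
def format_frame(data, formatter, subset=None):
    """Return display strings for a dict-of-columns table.

    formatter: a format string or callable for every targeted column, or a dict
    of column -> format string or callable.
    subset: optional list of column names to restrict the formatting to.
    """
    targeted = list(data) if subset is None else list(subset)
    if not isinstance(formatter, dict):
        formatter = {col: formatter for col in targeted}

    display = {}
    for col, values in data.items():
        fmt = formatter.get(col) if col in targeted else None
        if fmt is None:
            func = str          # untargeted columns show the raw value
        elif isinstance(fmt, str):
            func = fmt.format   # format string
        else:
            func = fmt          # callable
        display[col] = [func(v) for v in values]
    return display


data = {"param_1": ["a", "b"], "score_2": [0.123456, 0.5]}
print(format_frame(data, "{:.2f}", subset=["score_2"]))
print(format_frame(data, {"score_2": "{:.1e}"}))
```

Passing a single format string without a subset would try to format the string column too, which is one reason a subset argument (or a per-column dict) seems useful.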

I think format callable could in that case take also the row and column

The trouble with this is that all styler functions written by users have to accept those as well, even if they don't use them, which could be ok.

@mattilyra
Author

Well, the user style functions can always take **kwargs if they don't need the row/column index. It's easier to ignore them when you don't need them than it is to dig them up if they're not given to you.
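
A small sketch of that calling convention, with hypothetical names throughout (`apply_formatter`, `two_decimals`, `mark_group_a`, and the `row` / `column` keyword names are all assumptions): the caller always passes the labels, and a formatter either swallows them with **kwargs or picks out the one it needs.

```python
def apply_formatter(func, value, row, column):
    """Call a user formatter, always supplying the row and column labels."""
    return func(value, row=row, column=column)


def two_decimals(value, **kwargs):
    """Ignores the labels entirely."""
    return "{:.2f}".format(value)


def mark_group_a(value, row=None, **kwargs):
    """Uses the first level of a MultiIndex-style row label."""
    text = "{:.2f}".format(value)
    return text + "*" if row[0] == "a" else text


row, column = ("a", 0.0001), ("score_2", "mean")
print(apply_formatter(two_decimals, 0.5, row, column))  # 0.50
print(apply_formatter(mark_group_a, 0.5, row, column))  # 0.50*
```

The cost is that every user-written formatter needs the **kwargs catch-all, which is the trade-off raised above.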

I think I'll manage with git, merging your branch into this one.

Conflicts:
	pandas/core/style.py
[ci skip]

Conflicts:
	pandas/core/style.py
@mattilyra mattilyra changed the title Better formatting for .set_precision WIP: Better formatting for .set_precision Dec 9, 2015
@TomAugspurger
Contributor

@mattilyra Have you had a chance to work on this any more? Any unforeseen obstacles coming up?

I'm going to try and knock out sparse MultiIndices and truncated reprs in the next week or two, and will base it off the changes to the template. Just post here if you see any significant changes that are needed, or if you want to split that template change off into a separate pull request.

@jreback
Contributor

jreback commented Jan 11, 2016

@TomAugspurger status of this

@jreback
Contributor

jreback commented Mar 12, 2016

closing, but pls reopen if you'd like to update

@jreback jreback closed this Mar 12, 2016
Labels
IO HTML read_html, to_html, Styler.apply, Styler.applymap Output-Formatting __repr__ of pandas objects, to_string
Development

Successfully merging this pull request may close these issues.

HTML styling set_precision should take subset
3 participants